Language Identification in Document Images

Similar resources

Language Identification in Document Images

This paper presents a system dedicated to automatic language identification of text regions in heterogeneous and complex documents. This system is able to process documents with mixed printed and handwritten text and various layouts. To handle such a problem, we propose a system that performs the following sub-tasks: writing type identification (printed/handwritten), script identification and l...

Script and Language Identification in Degraded and Distorted Document Images

This paper reports a statistical identification technique that differentiates scripts and languages in degraded and distorted document images. We identify scripts and languages through document vectorization, which transforms each document image into an electronic document vector that characterizes the shape and frequency of the contained character and word images. We first identify scripts bas...

Language identification in Complex, Unoriented, and Degraded Document Images

We describe algorithms for identifying the language of text in document images that are complex, unoriented, and degraded. We distinguish among seven languages. Page layouts may be complex, containing text blocks in unknown, roughly Manhattan arrangements. The pages may be unoriented, that is, upright or rotated by 90, 180, or 270 degrees. The images may be degraded by digitization at coarse and uneq...

Language Identification in Degraded and Distorted Document Images

This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods that transform word images through a character shape coding process, our method directly captures word shapes with the local extremum points and the horizontal intersection numbers, which are both tolerant of noise, char...

Techniques for Language Identification for Hybrid Arabic-English Document Images

Because the Arabic language differs in its characteristics from the Romance and Anglo-Saxon languages, recognition of documents written in a hybrid of these languages requires that the language of the text be identified prior to the recognition phase. In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script...

Journal

Journal title: Electronic Imaging

Year: 2016

ISSN: 2470-1173

DOI: 10.2352/issn.2470-1173.2016.17.drr-058